Projection Robust Wasserstein Distance and Riemannian Optimization
Projection robust Wasserstein (PRW) distance, or Wasserstein projection pursuit (WPP), is a robust variant of the Wasserstein distance. Recent work suggests that this quantity is more robust than the standard Wasserstein distance, in particular when comparing probability measures in high dimensions. However, it has been ruled out for practical application because the optimization model is essentially non-convex and non-smooth, which makes the computation intractable. Our contribution in this paper is to revisit the original motivation behind WPP/PRW, but take the hard route of showing that, despite its non-convexity and lack of smoothness, and even despite some hardness results proved by \citet{Niles-2019-Estimation} in a minimax sense, the original formulation for PRW/WPP \textit{can} be efficiently computed in practice using Riemannian optimization, yielding in relevant cases better behavior than its convex relaxation. More specifically, we provide three simple algorithms with solid theoretical guarantees on their complexity bounds (one in the appendix), and demonstrate their effectiveness and efficiency by conducting extensive experiments on synthetic and real data.
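To make the idea of optimizing over projections concrete, the following is a minimal sketch of Riemannian gradient ascent in the simplest case k = 1, where the projection is a single unit vector and the constraint set is the unit sphere. It is not one of the paper's three algorithms: it assumes two equal-size, uniformly weighted point clouds (so the one-dimensional transport plan is obtained by sorting), holds that plan fixed when taking the gradient, and uses renormalization as the retraction. All function names, the toy data generator, and the step size are hypothetical choices made for illustration.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def projected_w2_sq(u, xs, ys):
    """Squared W2 between the 1-D projections of two equal-size, uniformly
    weighted point clouds. In 1-D, matching sorted projections is optimal."""
    ix = sorted(range(len(xs)), key=lambda i: dot(u, xs[i]))
    iy = sorted(range(len(ys)), key=lambda j: dot(u, ys[j]))
    diffs = [[a - b for a, b in zip(xs[i], ys[j])] for i, j in zip(ix, iy)]
    value = sum(dot(u, d) ** 2 for d in diffs) / len(diffs)
    return value, diffs


def riemannian_ascent_sphere(xs, ys, steps=200, lr=0.05):
    """Maximize projected_w2_sq over unit vectors u (the k = 1 case)."""
    d = len(xs[0])
    u = [1.0 / math.sqrt(d)] * d  # deterministic starting point on the sphere
    for _ in range(steps):
        _, diffs = projected_w2_sq(u, xs, ys)
        # Euclidean gradient of u -> mean((u . diff)^2), matching held fixed.
        grad = [0.0] * d
        for diff in diffs:
            c = 2.0 * dot(u, diff) / len(diffs)
            grad = [g + c * x for g, x in zip(grad, diff)]
        # Riemannian gradient: project onto the tangent space of the sphere at u.
        radial = dot(u, grad)
        rgrad = [g - radial * ui for g, ui in zip(grad, u)]
        # Retraction: take the step, then renormalize back onto the sphere.
        u = [ui + lr * r for ui, r in zip(u, rgrad)]
        norm = math.sqrt(dot(u, u))
        u = [ui / norm for ui in u]
    return u, projected_w2_sq(u, xs, ys)[0]


def make_shifted_clouds(n=200, d=5, shift=3.0, noise=0.5, seed=0):
    """Hypothetical toy data: two Gaussian clouds differing only along axis 0."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, noise) for _ in range(d)] for _ in range(n)]
    ys = [[rng.gauss(0.0, noise) for _ in range(d)] for _ in range(n)]
    for y in ys:
        y[0] += shift
    return xs, ys
```

Calling `riemannian_ascent_sphere(*make_shifted_clouds())` should return a direction concentrated on the first coordinate, since that is the only axis along which the two toy clouds differ. For k > 1 the unit sphere would be replaced by the set of d-by-k matrices with orthonormal columns, and the sorting step by a general optimal transport solver.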
Review for NeurIPS paper: Projection Robust Wasserstein Distance and Riemannian Optimization
Summary and Contributions: The Wasserstein distance emerges from the optimal transport (OT) problem and is a powerful metric for comparing two probability measures, since it offers nice theoretical properties and relevant practical implications. However, it has major limitations when applied in large-scale settings: since the Wasserstein distance is defined as the solution of a linear program, its computational cost rapidly becomes excessive as the dimension of the ambient data space increases; besides, its sample complexity can grow exponentially in the problem dimension. These unfavorable properties have motivated the development of "computational OT" methods in recent years, which define alternatives to the Wasserstein distance with better computational and/or statistical properties, and therefore allow the use of OT in machine learning applications. One approach that was recently proposed, and has become increasingly popular, consists of computing the Wasserstein distance between lower-dimensional representations of the two distributions being compared. Specifically, the Projection Robust Wasserstein (PRW) distance (also known as Wasserstein Projection Pursuit) builds these representations by orthogonally projecting the d-dimensional distributions onto the k-dimensional subspace (k < d) for which the Wasserstein distance between the k-dimensional reductions is maximized.
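The definition described above can be written as a max-min problem. The notation below is an assumption introduced here for illustration and does not appear in the review: the two measures are taken to be discrete, supported on points x_i and y_j in R^d, with Π(μ, ν) the set of transport plans between them and St(d, k) the set of d-by-k matrices with orthonormal columns.

```latex
\[
  \mathcal{P}_k^2(\mu, \nu)
  \;=\;
  \max_{U \in \mathrm{St}(d,k)} \;
  \min_{\pi \in \Pi(\mu,\nu)} \;
  \sum_{i=1}^{n} \sum_{j=1}^{m}
  \pi_{ij} \,\bigl\| U^\top x_i - U^\top y_j \bigr\|^2 ,
  \qquad
  \mathrm{St}(d,k) = \bigl\{ U \in \mathbb{R}^{d \times k} : U^\top U = I_k \bigr\}.
\]
```

The inner minimization is an ordinary optimal transport problem between the projected points, while the outer maximization ranges over a non-convex set of matrices, which is the source of the non-convexity mentioned in the abstract.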